DeepBankPT and Companion Portuguese Treebanks in a Multilingual Collection of Treebanks Aligned with the Penn Treebank

نویسندگان

  • António Branco
  • Catarina Carvalheiro
  • Francisco Costa
  • Sérgio Castro
  • João Ricardo Silva
  • Cláudia Martins
  • Joana Ramos
چکیده

We present a new collection of treebanks for the Portuguese language, comprising five datasets that cover major types of grammatically annotated corpora: TreeBankPT, PropBankPT, DependencyBankPT, LogicalFormBankPT and DeepBankPT. This collection is the Portuguese part of a broader multilingual collection of aligned treebanks that are developed for different languages, including English, under the same methodological principles and guidelines, and whose raw text versions are translations of the Penn Treebank, a de facto standard dataset for research on language technology.

برای دانلود متن کامل این مقاله و بیش از 32 میلیون مقاله دیگر ابتدا ثبت نام کنید

ثبت نام

اگر عضو سایت هستید لطفا وارد حساب کاربری خود شوید

منابع مشابه

Phrase Tagset Mapping for French and English Treebanks and Its Application in Machine Translation Evaluation

Many treebanks have been developed in recent years for different languages. But these treebanks usually employ different syntactic tag sets. This forms an obstacle for other researchers to take full advantages of them, especially when they undertake the multilingual research. To address this problem and to facilitate future research in unsupervised induction of syntactic structures, some resear...

متن کامل

The Xavier Module – Information Processing of Treebanks

This paper aims to introduce the Xavier module, a program package to process Treebanks (in particular, the Sejong Korean Treebank). In this paper, the procedure of implementing Xavier is discussed, and main usage of the program is also provided. Though this paper focuses on the Sejong Korean Treebank, Xavier is also applicable to other Treebanks, such as the Penn Treebanks, because it has been ...

متن کامل

تولید درخت بانک سازه‌ای زبان فارسی به روش تبدیل خودکار

Treebanks is one of important and useful resource in Natural Language Processing tasks. Dependency and phrase structures are two famous kinds of treebanks. There have already made many efforts to convert dependency structure to phrase structure. In this paper we study an approach to convert dependency structure to phrase structure because of lack of a big phrase structure Treebank in Persian. A...

متن کامل

Converting Dependency Structures to Phrase Structures

Treebanks are of two types according to their annotation schemata: phrase-structure Treebanks such as the English Penn Treebank [8] and dependency Treebanks such as the Czech dependency Treebank [6]. Long before Treebanks were developed and widely used for natural language processing, there had been much discussion of comparison between dependency grammars and context-free phrasestructure gramm...

متن کامل

Exploiting Multiple Treebanks for Parsing with Quasi-synchronous Grammars

We present a simple and effective framework for exploiting multiple monolingual treebanks with different annotation guidelines for parsing. Several types of transformation patterns (TP) are designed to capture the systematic annotation inconsistencies among different treebanks. Based on such TPs, we design quasisynchronous grammar features to augment the baseline parsing models. Our approach ca...

متن کامل

ذخیره در منابع من


  با ذخیره ی این منبع در منابع من، دسترسی به آن را برای استفاده های بعدی آسان تر کنید

برای دانلود متن کامل این مقاله و بیش از 32 میلیون مقاله دیگر ابتدا ثبت نام کنید

ثبت نام

اگر عضو سایت هستید لطفا وارد حساب کاربری خود شوید

عنوان ژورنال:

دوره   شماره 

صفحات  -

تاریخ انتشار 2014